[Feat] Enable Layerwise CPU offloading for SD3.5, Ovis-Image, Nextstep_1, LongCat-Image by yuanheng-zhao · Pull Request #2339 · vllm-project/vllm-omni

yuanheng-zhao · 2026-03-30T16:02:37Z


Purpose

This PR adds and validates layerwise CPU offloading support for more diffusion models (or the diffusion components of omni models).

Most of the work is testing and verifying that these models work with the feature. Out-of-scope issues or model-specific handling, if any, may be resolved in a follow-up PR.
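
As a rough illustration of the idea being validated here, the sketch below simulates layerwise offloading in pure Python: only the block currently executing plus a prefetched next block are resident on the device at any time. This is a conceptual toy, not vllm-omni's implementation; `simulate_peak_resident`, the block sizes, and the one-block prefetch window are all assumptions made for illustration.

```python
def simulate_peak_resident(block_sizes_gb, prefetch=1):
    """Return the peak device-resident weight size (GB) for one forward pass.

    Toy model: while block i runs, blocks i..i+prefetch are on the device;
    every other block is assumed to sit in CPU memory.
    """
    peak = 0.0
    n = len(block_sizes_gb)
    for i in range(n):
        resident = block_sizes_gb[i : min(i + 1 + prefetch, n)]
        peak = max(peak, sum(resident))
    return peak


# Hypothetical model: 24 transformer blocks of 0.5 GB each.
blocks = [0.5] * 24
all_on_device = sum(blocks)                 # every block resident at once
layerwise = simulate_peak_resident(blocks)  # current block + 1 prefetched
print(f"all on device: {all_on_device:.1f} GB, layerwise: {layerwise:.1f} GB")
```

The trade-off in this toy model is that every block has to be copied from host to device again on each pass, which is the cost side of the memory saving.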

Models planned and supported in this PR:

SD3.5, stabilityai/stable-diffusion-3.5-medium
Ovis-Image, AIDC-AI/Ovis-Image-7B
Nextstep_1, stepfun-ai/NextStep-1.1
LongCat-Image, meituan-longcat/LongCat-Image

Planned but not enabled in this PR:

GLM-Image: version mismatch
MammothModa2: could not be run successfully on my side (ValueError: Tokenizer class MammothUTokenizer does not exist or is not currently imported.)

Test Plan

Offline generation; refer to the subsequent comments for the detailed testing commands:
#2339 (comment)

Test Result

Stats

*Tested on a single H100 device
*Peak memory as recorded by DiffusionModelRunner._record_peak_memory
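
The three figures in each peak-memory cell of the table below are related by simple arithmetic: the pool overhead is reserved minus allocated, and the reported percentage matches overhead divided by reserved. A minimal sketch of that relationship follows. `format_peak_memory` is a hypothetical helper, not the actual `_record_peak_memory` code, and it is only an assumption that the inputs would come from counters such as `torch.cuda.max_memory_reserved()` and `torch.cuda.max_memory_allocated()`.

```python
def format_peak_memory(reserved_gb, allocated_gb):
    """Format a peak-memory summary in the same shape as the table cells."""
    overhead_gb = reserved_gb - allocated_gb
    # Percentage is taken relative to reserved memory (assumption that
    # matches the numbers reported in this PR).
    pct = 100.0 * overhead_gb / reserved_gb if reserved_gb else 0.0
    return (
        f"{reserved_gb:.2f} GB reserved, {allocated_gb:.2f} GB allocated, "
        f"{overhead_gb:.2f} GB pool overhead ({pct:.1f}%)"
    )


# SD3.5 without offloading, using the rounded values from the table.
print(format_peak_memory(20.15, 18.00))
```

Because the table values are already rounded to two decimals, recomputing the overhead from them can differ from the reported figure by 0.01 GB in some rows.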

model \ feature	Peak memory (offloading disabled)	Peak memory (layerwise)	Total gen time, seconds (offloading disabled)	Total gen time, seconds (layerwise)
stabilityai/stable-diffusion-3.5-medium	20.15 GB reserved, 18.00 GB allocated, 2.15 GB pool overhead (10.7%)	16.44 GB reserved, 14.05 GB allocated, 2.39 GB pool overhead (14.5%)	1.9754	4.8757
stepfun-ai/NextStep-1.1	29.58 GB reserved, 29.03 GB allocated, 0.55 GB pool overhead (1.9%)	5.96 GB reserved, 4.90 GB allocated, 1.06 GB pool overhead (17.8%)	73.7517	543.4749
AIDC-AI/Ovis-Image-7B	21.70 GB reserved, 19.51 GB allocated, 2.20 GB pool overhead (10.1%)	9.56 GB reserved, 6.62 GB allocated, 2.94 GB pool overhead (30.8%)	9.4076	26.6770
meituan-longcat/LongCat-Image	31.93 GB reserved, 29.67 GB allocated, 2.26 GB pool overhead (7.1%)	21.74 GB reserved, 18.74 GB allocated, 2.99 GB pool overhead (13.8%)	9.1991	24.6322

*Enabling layerwise offloading on stepfun-ai/NextStep-1.1 is strongly discouraged: it is an AR model with diffusion heads that runs multiple denoising steps for each generated token, so a large amount of offloading takes place.
*Total generation time increased for every model profiled above when the feature was enabled. I suspect that for image generation the compute is fast enough that the weight transfers are not fully overlapped. The same happened before on Qwen-Image image-gen tasks: #858 (comment). Further profiling on specific devices may be needed.
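
To make the trade-off easier to read, the sketch below derives the relative memory saving and the slowdown factor from the reserved-memory and generation-time figures in the table above. The numbers are copied from the table; the `summarize` helper and the choice of reserved (rather than allocated) memory as the basis are assumptions of this sketch.

```python
# (peak reserved GB, total generation seconds), copied from the stats table.
RESULTS = {
    "stabilityai/stable-diffusion-3.5-medium": {"base": (20.15, 1.9754), "layerwise": (16.44, 4.8757)},
    "stepfun-ai/NextStep-1.1": {"base": (29.58, 73.7517), "layerwise": (5.96, 543.4749)},
    "AIDC-AI/Ovis-Image-7B": {"base": (21.70, 9.4076), "layerwise": (9.56, 26.6770)},
    "meituan-longcat/LongCat-Image": {"base": (31.93, 9.1991), "layerwise": (21.74, 24.6322)},
}


def summarize(results):
    """Compute reserved-memory saving (%) and slowdown factor per model."""
    rows = {}
    for model, r in results.items():
        base_mem, base_time = r["base"]
        lw_mem, lw_time = r["layerwise"]
        rows[model] = {
            "mem_saved_pct": 100.0 * (base_mem - lw_mem) / base_mem,
            "slowdown_x": lw_time / base_time,
        }
    return rows


for model, row in summarize(RESULTS).items():
    print(
        f"{model}: {row['mem_saved_pct']:.1f}% less reserved memory, "
        f"{row['slowdown_x']:.2f}x generation time"
    )
```

On these figures the reserved-memory saving ranges from roughly 18% (SD3.5) to roughly 80% (NextStep-1.1), while generation time grows by about 2.5x to 2.8x for the three diffusion models and by about 7.4x for NextStep-1.1.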

Generated image comparison

(Side-by-side generated images, offloading disabled vs. layerwise offloading enabled, for stabilityai/stable-diffusion-3.5-medium, AIDC-AI/Ovis-Image-7B, stepfun-ai/NextStep-1.1, and meituan-longcat/LongCat-Image.)


Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>

Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

yuanheng-zhao · 2026-04-06T06:57:28Z

Text to image example

stabilityai/stable-diffusion-3.5-medium

python examples/offline_inference/text_to_image/text_to_image.py \
	--model stabilityai/stable-diffusion-3.5-medium \
	--prompt "A serene mountain landscape at sunset" \
	--negative-prompt "blurry, low quality, distorted" \
	--guidance-scale 4.5 \
	--num-inference-steps 28 \
	--height 1024 \
	--width 1024 \
	--seed 42 \
	--output output_sd3_layerwise.png \
	--enable-layerwise-offload

python examples/offline_inference/text_to_image/text_to_image.py \
	--model stabilityai/stable-diffusion-3.5-medium \
	--prompt "A serene mountain landscape at sunset" \
	--negative-prompt "blurry, low quality, distorted" \
	--guidance-scale 4.5 \
	--num-inference-steps 28 \
	--height 1024 \
	--width 1024 \
	--seed 42 \
	--output output_sd3.png

stepfun-ai/NextStep-1.1

python examples/offline_inference/text_to_image/text_to_image.py \
	  --model stepfun-ai/NextStep-1.1 \
	  --prompt "A baby panda wearing an Iron Man mask, holding a board with 'NextStep-1' written on it" \
	  --height 512 \
	  --width 512 \
	  --num-inference-steps 28 \
	  --guidance-scale 7.5 \
	  --guidance-scale-2 1.0 \
	  --cfg-schedule constant \
	  --seed 42 \
	  --output output_nextstep_layerwise.png \
	  --enable-layerwise-offload \
	  --init-timeout 1200 \
	  --stage-init-timeout 1200

python examples/offline_inference/text_to_image/text_to_image.py \
	  --model stepfun-ai/NextStep-1.1 \
	  --prompt "A baby panda wearing an Iron Man mask, holding a board with 'NextStep-1' written on it" \
	  --height 512 \
	  --width 512 \
	  --num-inference-steps 28 \
	  --guidance-scale 7.5 \
	  --guidance-scale-2 1.0 \
	  --cfg-schedule constant \
	  --seed 42 \
	  --output output_nextstep.png

AIDC-AI/Ovis-Image-7B

python examples/offline_inference/text_to_image/text_to_image.py \
	  --model AIDC-AI/Ovis-Image-7B \
	  --prompt "A creative 3D artistic render where the text \"OVIS-IMAGE\" is written in a bold, expressive handwritten brush style using thick, wet oil paint. The paint is a mix of vibrant rainbow colors (red, blue, yellow) swirling together like toothpaste or impasto art. You can see the ridges of the brush bristles and the glossy, wet texture of the paint. The background is a clean artist's canvas. Dynamic lighting creates soft shadows behind the floating paint strokes. Colorful, expressive, tactile texture, 4k detail." \
	  --height 1024 \
	  --width 1024 \
	  --num-inference-steps 50 \
	  --guidance-scale 5.0 \
	  --cfg-schedule constant \
	  --seed 42 \
	  --output output_ovis_image_layerwise.png \
	  --enable-layerwise-offload

python examples/offline_inference/text_to_image/text_to_image.py \
	  --model AIDC-AI/Ovis-Image-7B \
	  --prompt "A creative 3D artistic render where the text \"OVIS-IMAGE\" is written in a bold, expressive handwritten brush style using thick, wet oil paint. The paint is a mix of vibrant rainbow colors (red, blue, yellow) swirling together like toothpaste or impasto art. You can see the ridges of the brush bristles and the glossy, wet texture of the paint. The background is a clean artist's canvas. Dynamic lighting creates soft shadows behind the floating paint strokes. Colorful, expressive, tactile texture, 4k detail." \
	  --height 1024 \
	  --width 1024 \
	  --num-inference-steps 50 \
	  --guidance-scale 5.0 \
	  --cfg-schedule constant \
	  --seed 42 \
	  --output output_ovis_image.png

meituan-longcat/LongCat-Image

python examples/offline_inference/text_to_image/text_to_image.py \
	  --model meituan-longcat/LongCat-Image \
	  --prompt "一个年轻的亚裔女性，身穿黄色针织衫，搭配白色项链。她的双手放在膝盖上，表情恬静。背景是一堵粗糙的砖墙，午后的阳光温暖地洒在她身上，营造出一种宁静而温馨的氛围。镜头采用中距离视角，突出她的神态和服饰的细节。光线柔和地打在她的脸上，强调她的五官和饰品的质感，增加画面的层次感与亲和力。整个画面构图简洁，砖墙的纹理与阳光的光影效果相得益彰，突显出人物的优雅与从容。" \
	  --height 768 \
	  --width 1344 \
	  --num-inference-steps 50 \
	  --guidance-scale 4.0 \
	  --seed 42 \
	  --output output_longcat.png

python examples/offline_inference/text_to_image/text_to_image.py \
	  --model meituan-longcat/LongCat-Image \
	  --prompt "一个年轻的亚裔女性，身穿黄色针织衫，搭配白色项链。她的双手放在膝盖上，表情恬静。背景是一堵粗糙的砖墙，午后的阳光温暖地洒在她身上，营造出一种宁静而温馨的氛围。镜头采用中距离视角，突出她的神态和服饰的细节。光线柔和地打在她的脸上，强调她的五官和饰品的质感，增加画面的层次感与亲和力。整个画面构图简洁，砖墙的纹理与阳光的光影效果相得益彰，突显出人物的优雅与从容。" \
	  --height 768 \
	  --width 1344 \
	  --num-inference-steps 50 \
	  --guidance-scale 4.0 \
	  --seed 42 \
	  --output output_longcat_layerwise.png \
	  --enable-layerwise-offload
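
Each pair of commands above differs only in the output filename and the `--enable-layerwise-offload` flag. A minimal sketch of generating both variants from one set of options is shown below; `build_commands` is a hypothetical helper written for this illustration, while the script path and flags are taken from the commands above.

```python
import shlex

SCRIPT = "examples/offline_inference/text_to_image/text_to_image.py"


def build_commands(model, output_stem, **options):
    """Return (baseline_argv, layerwise_argv) for one model.

    Keyword names are mapped to CLI flags by replacing '_' with '-'.
    """
    base = ["python", SCRIPT, "--model", model]
    for name, value in options.items():
        base += [f"--{name.replace('_', '-')}", str(value)]
    baseline = base + ["--output", f"{output_stem}.png"]
    layerwise = base + [
        "--output", f"{output_stem}_layerwise.png",
        "--enable-layerwise-offload",
    ]
    return baseline, layerwise


baseline, layerwise = build_commands(
    "stabilityai/stable-diffusion-3.5-medium",
    "output_sd3",
    prompt="A serene mountain landscape at sunset",
    negative_prompt="blurry, low quality, distorted",
    guidance_scale=4.5,
    num_inference_steps=28,
    height=1024,
    width=1024,
    seed=42,
)
print(shlex.join(baseline))
print(shlex.join(layerwise))
```

The resulting lists can be passed to `subprocess.run` directly. For the NextStep-1.1 layerwise run, the extra `--init-timeout 1200` and `--stage-init-timeout 1200` arguments used above would need to be appended to the layerwise list only.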

Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>


yuanheng-zhao · 2026-04-07T02:59:42Z

PTAL @wtomin
cc @ZJY0516 @gcanlin

gcanlin

LGTM, please fix conflicts :)

Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

wtomin mentioned this pull request Apr 1, 2026

[RFC]: Continuous Diffusion Model Acceleration Support #1217

Open

1 task

yuanheng-zhao force-pushed the feat/add-imagegen-layerwise branch from d9acf3b to 0e22230 Compare April 5, 2026 13:50

yuanheng-zhao added 4 commits April 6, 2026 04:01

[feat] add layerwise blocks for glm-image, nextstep_1, sd3

61ebaa0

Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>

upd doc (temp)

a5ee9e8

Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>

text-to-image log info

5a1773f

Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

enable for ovis image

e184a17

Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

yuanheng-zhao force-pushed the feat/add-imagegen-layerwise branch from 0e22230 to e184a17 Compare April 6, 2026 06:22

apply new attrs (mutiple)

483d862

Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

yuanheng-zhao changed the title ~~[WIP][Feat] Support Layerwise CPU offloading for more image-gen models~~ [WIP][Feat] Support Layerwise CPU offloading for SD3.5, Ovis-Image, Nextstep_1, LongCat-Image Apr 6, 2026

yuanheng-zhao added 2 commits April 6, 2026 08:54

upd

9c21354

Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

revert glm image

29b9af8

Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

yuanheng-zhao mentioned this pull request Apr 6, 2026

[Bugfix] Restore user config/runtime stage init timeout #2519

Merged

5 tasks

upd feature docs

77bf9c9

Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

yuanheng-zhao changed the title ~~[WIP][Feat] Support Layerwise CPU offloading for SD3.5, Ovis-Image, Nextstep_1, LongCat-Image~~ [Feat] Support Layerwise CPU offloading for SD3.5, Ovis-Image, Nextstep_1, LongCat-Image Apr 6, 2026

yuanheng-zhao marked this pull request as ready for review April 6, 2026 13:09

yuanheng-zhao requested a review from hsliuustc0106 as a code owner April 6, 2026 13:09

yuanheng-zhao changed the title ~~[Feat] Support Layerwise CPU offloading for SD3.5, Ovis-Image, Nextstep_1, LongCat-Image~~ [Feat] Enable Layerwise CPU offloading for SD3.5, Ovis-Image, Nextstep_1, LongCat-Image Apr 6, 2026

gcanlin approved these changes Apr 7, 2026

gcanlin added ready label to trigger buildkite CI and removed ready label to trigger buildkite CI labels Apr 7, 2026

Merge from main

2406b90

Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>

hsliuustc0106 added the ready label to trigger buildkite CI label Apr 7, 2026

wtomin merged commit 8a55d3d into vllm-project:main Apr 8, 2026
8 checks passed

yuanheng-zhao deleted the feat/add-imagegen-layerwise branch April 8, 2026 02:11

NickCao mentioned this pull request Apr 9, 2026

[Refactor] Let diffusion pipelines declare offloadable modules via SupportsModuleOffload #2427

Merged

5 tasks

BBuf mentioned this pull request Apr 20, 2026

SGLang Diffusion 外部影响力调研：kernel、feature 与平台采用情况 BBuf/how-to-optim-algorithm-in-cuda#14

Open

wtomin merged 9 commits into vllm-project:main from yuanheng-zhao:feat/add-imagegen-layerwise

4 participants
